The CAZyme classifiers dbCAN, CUPP and eCAMI were independently evaluated against a high-quality benchmark test set. Their performances were evaluated on CAZyme/non-CAZyme differentiation and on the multilabel classification of CAZy family annotations. This notebook contains the statistical evaluation of the CAZyme classifiers.
Results summary:
- dbCAN and DIAMOND showed the strongest performances in CAZyme/non-CAZyme differentiation
- dbCAN was the strongest performing tool across all categories, Hotpep (a tool invoked by dbCAN) was the weakest
- The performances of CUPP and eCAMI were similar, although CUPP showed a marginally better performance when comparing the multilabel classification of CAZy family annotations
- The performance of dbCAN may be optimised by substituting Hotpep with CUPP and/or eCAMI
The CAZyme classifiers dbCAN (Zhang et al. 2018), CUPP (Barrett and Lange, 2019) and eCAMI (Xu et al. 2019) use different methods to predict whether a protein is a CAZyme or non-CAZyme, and to predict the CAZy family annotations of predicted CAZymes. These classifiers have not previously been independently evaluated against a high-quality benchmark test set.
This notebook lays out the independent evaluation of dbCAN, CUPP and eCAMI against a high-quality benchmark test set. The tools were evaluated on their ability to differentiate between CAZymes and non-CAZymes, and on their performance in predicting the CAZy family annotations of predicted CAZymes.
dbCAN incorporates the three protein function classifiers HMMER (Potter et al. 2018), Hotpep (Busk et al. 2017) and DIAMOND (Buchfink et al. 2015). To comprehensively evaluate the performance of dbCAN, the predictions from HMMER, Hotpep and DIAMOND were evaluated independently of each other, and the consensus prediction (a prediction which at least two of the tools agree upon) was defined as the dbCAN result.
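The notebook itself is written in R; purely as an illustrative sketch, the majority-vote consensus rule can be expressed as follows (function name hypothetical):

```python
def dbcan_consensus(hmmer: bool, hotpep: bool, diamond: bool) -> bool:
    """A protein counts as a dbCAN CAZyme when at least two of the
    three constituent tools classify it as a CAZyme."""
    return (hmmer + hotpep + diamond) >= 2
```

For example, a protein flagged by HMMER and DIAMOND but not Hotpep is still counted as a dbCAN CAZyme.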
A single test set of 100 CAZymes and the 100 non-CAZymes with the highest sequence similarity to known CAZymes (rated by bit-score ratio) was created per genomic assembly selected for inclusion in the benchmark. Choosing the 100 most similar non-CAZymes was designed to increase the probability of confusing the classifiers, evaluating the tools against a difficult test set and thereby giving a better idea of the performance to expect in practice. An equal number of CAZymes and non-CAZymes was selected to prevent over-representation of one population over the other.
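The test-set assembly can be sketched as follows (illustrative Python; the notebook's own pipeline is in R, and how the 100 CAZymes themselves are chosen is an assumption here — random sampling — whereas the hardest-non-CAZyme selection follows the text above):

```python
import random

def build_test_set(cazyme_ids, non_cazyme_bsr, n=100, seed=1):
    """Assemble one test set: n CAZymes plus the n non-CAZymes with the
    highest bit-score ratio (BSR) against the known CAZymes.

    non_cazyme_bsr: list of (protein_id, bsr) tuples.
    """
    rng = random.Random(seed)
    chosen_cazymes = rng.sample(list(cazyme_ids), n)
    # Sort non-CAZymes by descending BSR and keep the n most CAZyme-like.
    hardest = sorted(non_cazyme_bsr, key=lambda rec: rec[1], reverse=True)[:n]
    return chosen_cazymes, [pid for pid, _ in hardest]
```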
For a genomic assembly to be included in the creation of a test set, the assembly had to meet all of the following criteria:
The genomic assemblies were also chosen from a range of taxonomies, to provide as informative a picture as possible of the performance of the classifiers over the range of datasets that users may wish to analyse.
Table ?? contains the genomic assemblies used to create the test sets for the evaluation. In total 81 assemblies were chosen: 1 from an Oomycete species (no further Oomycete species with greater than 100 CAZymes in CAZy could be found), 25 fungal Ascomycete species, 13 yeasts, 2 eukaryotic microorganisms, 20 Gram-positive bacteria and 20 Gram-negative bacteria.
## [1] "Mean percentage of genome incorporated in the CAZome across all test sets:"
## [1] 3.140472
## [1] "Standard deviation of the percentage of genome incorporated in the CAZome across all test sets:"
## [1] 1.174488
## [1] "Mean percentage of CAZomes incorporated in the test set across all genomes:"
## [1] 64.37203
## [1] "Standard deviation of the percentage of CAZome incorporated in the test set across all genomes:"
## [1] 25.54491
Figure 2.1: Histogram of CAZome coverage of the test sets for each respective source genomic assembly, overlaid by a box-and-whisker plot of the percentage of the CAZome incorporated in the test set.
A CAZyme classifier identifies a protein as a CAZyme by assigning CAZy family annotations to proteins that meet specific criteria. If no CAZy family annotations are assigned to a protein, the tool has identified the protein as a non-CAZyme. This notebook evaluates the performance of the CAZyme classifiers dbCAN (which incorporates HMMER, Hotpep and DIAMOND), CUPP and eCAMI for this binary CAZyme/non-CAZyme classification.
For every classifier-test set pair, the specificity, sensitivity, precision, F1-score and accuracy were calculated.
The mean of each statistical parameter was calculated for each classifier across all tests, to represent the overall performance of each CAZyme classifier.
These results are presented in table 3.1.
| Classifier | Mean Specificity | Specificity Standard Deviation | Mean Sensitivity | Sensitivity Standard Deviation | Mean Precision | Precision Standard Deviation | Mean F1-score | F1-score Standard Deviation | Mean Accuracy | Accuracy Standard Deviation |
|---|---|---|---|---|---|---|---|---|---|---|
| dbCAN | 0.9869 | 0.0245 | 0.9087 | 0.1123 | 0.9866 | 0.0241 | 0.9412 | 0.0796 | 0.9478 | 0.0564 |
| HMMER | 0.9901 | 0.0163 | 0.8831 | 0.0835 | 0.9893 | 0.0174 | 0.9305 | 0.0613 | 0.9366 | 0.0422 |
| Hotpep | 0.9840 | 0.0257 | 0.8189 | 0.1327 | 0.9815 | 0.0287 | 0.8862 | 0.0917 | 0.9014 | 0.0666 |
| DIAMOND | 0.9844 | 0.0263 | 0.9261 | 0.1298 | 0.9847 | 0.0251 | 0.9481 | 0.0907 | 0.9553 | 0.0641 |
| CUPP | 0.9917 | 0.0156 | 0.8570 | 0.0825 | 0.9908 | 0.0172 | 0.9167 | 0.0531 | 0.9244 | 0.0417 |
| eCAMI | 0.9836 | 0.0257 | 0.8610 | 0.1328 | 0.9826 | 0.0254 | 0.9112 | 0.0868 | 0.9223 | 0.0647 |
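The statistics in table 3.1 follow the standard confusion-matrix definitions; the notebook computes them in R, and the sketch below is an illustrative Python equivalent:

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard binary-classification statistics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: fraction of known CAZymes recovered
    specificity = tn / (tn + fp)   # fraction of known non-CAZymes recovered
    precision = tp / (tp + fp)     # fraction of CAZyme calls that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy}
```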
Specificity is the proportion of known negatives (known non-CAZymes) which are correctly classified as negatives (non-CAZymes).
Figure 3.1 is a graphical representation of the results calculated in table 3.1.
Figure 3.1: One-dimensional scatter plot of specificity scores of CAZyme and non-CAZyme predictions per test set, overlaying box plot of standard deviation.
Sensitivity (also known as recall) is the proportion of known positives (CAZymes) that are correctly identified as positives (CAZymes).
Figure 3.2 graphically represents the results calculated in table 3.1.
Figure 3.2: One-dimensional scatter plot of recall (sensitivity) scores of CAZyme and non-CAZyme predictions per test set, overlaying box plot of standard deviation.
Precision is the proportion of positive predictions by the classifiers that are correct.
In this case, precision represents the fraction of CAZyme predictions by the classifiers that are correct, specifically the proportion of predicted CAZymes that are known CAZymes.
Figure 3.3 is a visual representation of the results calculated in table 3.1.
Figure 3.3: One-dimensional scatter plot of precision scores of CAZyme and non-CAZyme predictions per test set, overlaying box plot of standard deviation.
The F1-score is the harmonic mean of recall and precision and provides an idea of the overall performance of the tool, 0 being the lowest and 1 the best performance. Figure 3.4 shows the F1-score from each test set, for each classifier.
Figure 3.4: Bar chart of F1-scores of the CAZyme classifiers' differentiation between CAZymes and non-CAZymes.
Accuracy (calculated as (TP + TN) / (TP + TN + FP + FN)) provides an idea of the overall performance of the classifiers, as a measure of the degree to which their CAZyme/non-CAZyme predictions conform to the correct result. Figure 3.5 is a plot of the respective data from table 3.1.
Figure 3.5: Bar chart of accuracy of the CAZyme classifiers' differentiation between CAZymes and non-CAZymes.
Below is a combination (3x2) plot of the above plots for evaluating the binary CAZyme/non-CAZyme classification performance of the CAZyme classifiers.
The statistics evaluated above provide an idea of the general performance of the tools, but not of the expected range of performance. Specifically, the data does not provide a clear image of the best and worst performance a user can expect when using these tools.
To compare the expected typical range in accuracies for each classifier, 6 test sets (identified by their source genomic assemblies) were selected at random. The CAZyme/non-CAZyme predictions for each classifier, for each test set, were bootstrap resampled 100 times, and the accuracy of each bootstrap sample was calculated. The accuracies of the bootstrap samples for each classifier were plotted on stacked histograms, shown in figure 3.6.
Figure 3.6: Stacked histograms of bootstrap sample accuracies of CAZyme classifiers’ differentiation between CAZymes and non-CAZymes. 6 test sets (identified by their source genomic assembly) were selected at random. The CAZyme/non-CAZyme predictions for each classifier, for each test set, were bootstrap resampled 100 times. The accuracy of each of the 600 bootstrap samples per test set were plotted as a stacked histogram.
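The resampling scheme can be sketched as follows (illustrative Python; the notebook performs the bootstrap in R):

```python
import random

def bootstrap_accuracies(y_true, y_pred, n_samples=100, seed=42):
    """Resample (truth, prediction) pairs with replacement and return the
    accuracy of each bootstrap sample."""
    rng = random.Random(seed)
    pairs = list(zip(y_true, y_pred))
    accs = []
    for _ in range(n_samples):
        sample = [rng.choice(pairs) for _ in pairs]
        accs.append(sum(t == p for t, p in sample) / len(sample))
    return accs
```

The spread of the resulting accuracies indicates how stable each classifier's performance is on a given test set.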
Few of the known non-CAZymes were classified as CAZymes by the CAZyme classifiers. Those non-CAZymes that were classified as CAZymes may reflect a very high sequence similarity between the non-CAZyme and known CAZymes, and/or the protein being incorrectly recorded in CAZy as a non-CAZyme when it is in fact a CAZyme. The latter case may be true if all 6 classifiers classify the non-CAZyme as a CAZyme.
First the equation to calculate the correlation must be determined. Either the Pearson or Spearman's correlation coefficient could be calculated. Pearson's correlation assumes the data is normally distributed; Spearman's correlation is a nonparametric statistic because it does not presume a normal distribution. Figure @ref(fig:fp.cor)[A] shows a histogram of the number of prediction tools that generated false positives, and figure @ref(fig:fp.cor)[B] a boxplot of the highest BLAST score ratios of false positives against CAZymes in the same test set. Both plots demonstrate the data is not normally distributed. Additionally, figure @ref(fig:fp.cor)[B] shows the data contains outliers, to which Pearson's correlation is also sensitive. Therefore, the Spearman's correlation coefficient was calculated, which is shown in @ref(fig:fp.cor)[C].
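Spearman's rho is simply Pearson's correlation computed on the ranks of the data, with tied values given their average rank, which is what makes it robust to non-normality and outliers. An illustrative stdlib Python sketch (the notebook computes the coefficient in R):

```python
from statistics import mean

def _ranks(xs):
    """Average 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = _ranks(xs), _ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0
```

Because only ranks are used, any monotone relationship (even a nonlinear one) yields rho = 1.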
Overall, all tools show a low probability of producing false positives (misclassifying a non-CAZyme as a CAZyme), and few of the positive predictions are false positives. Therefore, we can be confident that the CAZyme predictions made by each of these tools are most likely (typically 90-100%) correct. However, all the classifiers demonstrated a consistent tendency not to identify all CAZymes within a CAZome. Therefore, we can be confident in the CAZyme predictions, but should not presume all non-CAZyme predictions are correct; these classifiers are unlikely to identify the complete CAZome, although a near-complete CAZome will be accurately identified.
dbCAN consistently demonstrated the strongest performance in all categories, suggesting that eCAMI and CUPP are not suitable replacements for dbCAN as a whole. Hotpep consistently demonstrated the weakest performance, and is incorporated within dbCAN. Therefore, substituting eCAMI and/or CUPP into dbCAN in place of Hotpep may further improve the performance of dbCAN. The newer k-mer based methods eCAMI and CUPP demonstrated similar performances: CUPP showed a more consistent performance, while eCAMI demonstrated a greater range in performance, although its mean performance was fractionally greater than that of CUPP. However, more bootstrap-calculated accuracy scores fell within the range 0.9-1.0 for CUPP than for eCAMI. This suggests that CUPP may typically provide a better performance than eCAMI, although eCAMI does have the potential on some occasions to outperform CUPP, depending on the test set.
CAZy groups CAZymes into CAZy families by sequence similarity, and CAZy families are grouped into one of 6 functional classes. The CAZyme classifiers predict the CAZy family annotations of predicted CAZymes, but it is also of interest to measure the performance of the classifiers at the CAZy class level. Specifically, a classifier may struggle to predict the correct CAZy family for a CAZyme but consistently predict the correct CAZy class. Therefore, the aim of this part of the evaluation is to assess the ability of the classifiers to predict the correct CAZy class of predicted CAZymes.
Below is a table summarising all statistical parameters calculated to evaluate the performance of CAZy class classification for each prediction tool, across all CAZy classes.
| Classifier | Mean Specificity | Specificity Standard Deviation | Mean Sensitivity | Sensitivity Standard Deviation | Mean Precision | Precision Standard Deviation | Mean F1-score | F1-score Standard Deviation | Mean Accuracy | Accuracy Standard Deviation |
|---|---|---|---|---|---|---|---|---|---|---|
| dbCAN | 0.9962 | 0.0121 | 0.8928 | 0.1721 | 0.9626 | 0.1298 | 0.9170 | 0.1460 | 0.9755 | 0.0429 |
| HMMER | 0.9966 | 0.0103 | 0.8270 | 0.2407 | 0.9612 | 0.1388 | 0.8675 | 0.2013 | 0.9686 | 0.0392 |
| Hotpep | 0.9749 | 0.0471 | 0.8317 | 0.2120 | 0.8576 | 0.2495 | 0.8207 | 0.2116 | 0.9421 | 0.0673 |
| DIAMOND | 0.9956 | 0.0130 | 0.9078 | 0.1960 | 0.9578 | 0.1526 | 0.9213 | 0.1725 | 0.9816 | 0.0426 |
| CUPP | 0.9975 | 0.0097 | 0.7118 | 0.3888 | 0.7695 | 0.4098 | 0.7343 | 0.3937 | 0.9554 | 0.0635 |
| eCAMI | 0.9852 | 0.0324 | 0.8362 | 0.2157 | 0.8966 | 0.2066 | 0.8487 | 0.1950 | 0.9590 | 0.0536 |
Below, a proportional area plot representing the F-beta score for each CAZyme classifier for each test set is generated. Each square is sized proportionally to the relative sample size. Not every class was included in every test set, resulting in different sample sizes between CAZy classes; the sample sizes are the same between classifiers.
A dataframe of the number of test sets containing each CAZy class is generated.
## Prediction_tool GH GT PL CE AA CBM
## 1 dbCAN 70 70 38 67 37 70
## 2 HMMER 70 70 38 67 37 70
## 3 DIAMOND 70 70 38 67 37 70
## 4 Hotpep 70 70 38 67 37 70
## 5 CUPP 70 70 38 67 37 70
## 6 eCAMI 70 70 39 67 37 70
The sensitivity of each CAZyme classifier can be plotted against the specificity for each CAZy class; however, plotting all CAZy classes in a single plot produces an overly cramped plot unless very few test sets are used.
Below the prediction sensitivity is plotted against the specificity for each classifier, and a separate plot is generated for each CAZy class.
The scatter plots of sensitivity against specificity overlay a coloured contour to highlight the distribution of the points. When too many points share the same value a contour cannot be generated, so noise is added to the data in order to plot a contour. The original data are used to plot the scatter plot, and the data with added noise are used to plot the contour.
The percentage of the data points which need noise added to them in order to generate a contour varies from dataset to dataset. To change the percentage of data points with noise added, change the third argument of the call to the function plot.class.sens.vs.spec(), which is used to generate the plots. The third argument is the percentage of data points to add noise to, written in decimal form.
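The noise-adding step amounts to jittering a fraction of the points. An illustrative Python sketch (function name, noise level and seed are all hypothetical; the notebook's own helper is the R function plot.class.sens.vs.spec()):

```python
import random

def add_jitter(points, fraction=0.1, sd=0.005, seed=0):
    """Return a copy of (x, y) points with Gaussian noise added to a random
    fraction of them, so a density contour can be drawn even when many
    points share exactly the same value."""
    rng = random.Random(seed)
    n_jitter = round(len(points) * fraction)
    idx = set(rng.sample(range(len(points)), n_jitter))
    return [(x + rng.gauss(0, sd), y + rng.gauss(0, sd)) if i in idx else (x, y)
            for i, (x, y) in enumerate(points)]
```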
A single CAZyme can be included in multiple CAZy classes, leading to multilabel classification of CAZymes. To address this and evaluate the multilabel classification of CAZy classes, the Rand Index (RI) and Adjusted Rand Index (ARI) were calculated.
The RI is a measure of accuracy across all potential classifications of a protein. The RI ranges from 0 (no correct annotations) to 1 (all annotations correct). The ARI is the RI adjusted for chance: 0 is equivalent to assigning the CAZy class annotations at random, -1 indicates the annotations are systematically handed out incorrectly, and 1 indicates the annotations are all correct.
| Prediction_tool | Mean | Standard Deviation |
|---|---|---|
| dbCAN | 0.9398 | 0.2359 |
| HMMER | 0.9268 | 0.2537 |
| DIAMOND | 0.9545 | 0.2079 |
| Hotpep | 0.8706 | 0.3212 |
| CUPP | 0.9007 | 0.2852 |
| eCAMI | 0.9060 | 0.2836 |
Plotted below are violin plots underlying scatter plots, presenting the RI and ARI for every protein across all test sets.
Figure 4.1: Violin plot of Rand Index (RI) of performance of the CAZyme classifiers to predict the multilabel classification of CAZy classes.
Figure 4.2: Violin plot of Adjusted Rand Index (ARI) of performance of the CAZyme classifiers to predict the multilabel classification of CAZy classes.
The following section evaluates the performance of the CAZyme classifiers to predict CAZy family classifications.
Below is a table summarising the overall CAZy family classification performance for each test set, across all CAZy families.
| Classifier | Mean Specificity | Specificity Standard Deviation | Mean Sensitivity | Sensitivity Standard Deviation | Mean Precision | Precision Standard Deviation | Mean F1-score | F1-score Standard Deviation | Mean Accuracy | Accuracy Standard Deviation |
|---|---|---|---|---|---|---|---|---|---|---|
| dbCAN | 0.9999 | 0.0003 | 0.8874 | 0.2417 | 0.9309 | 0.2275 | 0.8997 | 0.2349 | 0.9995 | 0.0014 |
| HMMER | 0.9999 | 0.0003 | 0.8703 | 0.2814 | 0.8861 | 0.2791 | 0.8640 | 0.2781 | 0.9994 | 0.0022 |
| Hotpep | 0.9994 | 0.0020 | 0.7621 | 0.3347 | 0.7661 | 0.3771 | 0.7305 | 0.3504 | 0.9987 | 0.0034 |
| DIAMOND | 0.9999 | 0.0003 | 0.8927 | 0.2386 | 0.9268 | 0.2257 | 0.9025 | 0.2323 | 0.9997 | 0.0008 |
| CUPP | 1.0000 | 0.0002 | 0.6582 | 0.4360 | 0.7048 | 0.4458 | 0.6723 | 0.4354 | 0.9992 | 0.0023 |
| eCAMI | 0.9997 | 0.0009 | 0.7356 | 0.3412 | 0.7791 | 0.3671 | 0.7372 | 0.3437 | 0.9992 | 0.0016 |
To evaluate the overall performance of each classifier for each CAZy family, the F1-score was calculated for every family. Families were grouped by their parent CAZy class, and the distribution of the F1-scores is shown in figure 5.1.
Figure 5.1: Proportional area plot of the per-CAZy-family F1-score distribution, per CAZy class.
Below is a table displaying the number of test sets in which each CAZy class was present, which were used to draw the proportional areas for each class in figure 5.1.
## Prediction_tool GH GT PL CE AA CBM
## 1 dbCAN 124 70 22 16 14 50
## 2 HMMER 126 72 22 16 14 51
## 3 DIAMOND 124 70 22 16 14 50
## 4 Hotpep 125 70 22 16 14 65
## 5 CUPP 124 70 22 16 14 50
## 6 eCAMI 124 70 22 16 14 61
To evaluate the performance of predicting each CAZy family independently of all other CAZy families, the sensitivity and specificity for each CAZy family, for each CAZyme classifier, were calculated and plotted against each other (Fig. ??). Whereas sensitivity was plotted directly against specificity for the CAZy classes, owing to the extremely small variation in specificity scores here, sensitivity was plotted as a percentage against log10 of the specificity percentage.
Later in this report the sensitivity for each CAZy family is plotted against specificity, as was done for the CAZy classes. However, owing to the extremely small differences in specificity, with no tool producing a specificity less than 0.995, it is extremely difficult to separate performance by specificity, so a boxplot and scatter plot for each classifier are plotted instead. Each point represents one test set, and test sets are grouped by CAZyme classifier and facet wrapped by the parent CAZy class.
Figure 5.2: Scatter plot overlaying a one-dimensional box-and-whisker plot of sensitivity for each CAZy family for each CAZyme classifier. Each CAZy family is represented as a single point on the plot.
For better resolution we can group the CAZy families by their parent CAZy classes, and compare the performances of the tools class by class. Owing to the minimal variation in specificity scores, specificity was plotted as log10 of the percentage specificity.
Figure 5.3 shows the plotting of sensitivity against specificity for each Glycoside Hydrolase CAZy family.
Figure 5.3: Scatter plot of recall (sensitivity) against specificity for predicting each CAZy family for each CAZyme classifier in the CAZy class Glycoside Hydrolases. Each GH CAZy family is represented as a single point on the plot.
Figure 5.4 shows the plotting of sensitivity against specificity for each Glycosyltransferases CAZy family.
Figure 5.4: Scatter plot of recall (sensitivity) against specificity for predicting each CAZy family for each CAZyme classifier in the CAZy class Glycosyltransferases. Each GT CAZy family is represented as a single point on the plot.
Figure 5.5 shows the plotting of sensitivity against specificity for each Polysaccharide Lyases CAZy family.
Figure 5.5: Scatter plot of recall (sensitivity) against specificity for predicting each CAZy family for each CAZyme classifier in the CAZy class Polysaccharide Lyases. Each PL CAZy family is represented as a single point on the plot.
Figure 5.6 shows the plotting of sensitivity against specificity for each Carbohydrate Esterases CAZy family.
Figure 5.6: Scatter plot of recall (sensitivity) against specificity for predicting each CAZy family for each CAZyme classifier in the CAZy class Carbohydrate Esterases. Each CE CAZy family is represented as a single point on the plot.
Figure 5.7 shows the plotting of sensitivity against specificity for each Auxiliary Activities CAZy family.
Figure 5.7: Scatter plot of recall (sensitivity) against specificity for predicting each CAZy family for each CAZyme classifier in the CAZy class Auxiliary Activities. Each AA CAZy family is represented as a single point on the plot.
Figure 5.8 shows the plotting of sensitivity against specificity for each Carbohydrate Binding Module CAZy family.
Figure 5.8: Scatter plot of recall (sensitivity) against specificity for predicting each CAZy family for each CAZyme classifier in the CAZy class Carbohydrate Binding Modules. Each CBM CAZy family is represented as a single point on the plot.
We then pulled out the CAZy families with a sensitivity score less than 0.75, of which there were only 10.
CAZy annotates proteins in a domain-wise manner. Consequently, a single protein may be assigned to multiple CAZy families. The ability of a classifier to assign all the correct CAZy family annotations to a given protein is not captured when only evaluating the CAZy family classification performance per CAZy family, independently of all other CAZy families.
The CAZy family multilabel classification performance is represented by the Rand Index (RI) and Adjusted Rand Index (ARI). The RI is a quantitative measure of similarity between two clusterings, obtained by considering all pairs of samples and counting the pairs that are assigned to the same or different clusters in the predicted and true clusterings. In this case the two clusterings are the predicted and ground truth CAZy family annotations. The raw RI score is then "adjusted for chance" into the ARI score using the following scheme:
ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)
This produces a score between -1 and 1. A score of 1 is produced if all predicted and known CAZy family annotations are identical, 0 if the clustering is completely random, and -1 if the clustering is systematically incorrect, i.e. the number of incorrect classifications of proteins is greater than would be expected from randomly annotating proteins with CAZy families.
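The pair-counting definition of the RI underlying the formula above can be sketched as follows (illustrative Python; the ARI additionally requires the expected RI under random labelling, per the adjustment scheme given above):

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Rand Index: fraction of element pairs on which two clusterings agree,
    i.e. both place the pair in the same cluster, or both place it apart."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Note the RI is invariant to relabelling: only the co-membership structure of the two clusterings matters.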
| Prediction_tool | Mean | Standard Deviation |
|---|---|---|
| dbCAN | 0.9997 | 0.0011 |
| HMMER | 0.9996 | 0.0014 |
| DIAMOND | 0.9998 | 0.0010 |
| Hotpep | 0.9991 | 0.0023 |
| CUPP | 0.9995 | 0.0015 |
| eCAMI | 0.9994 | 0.0017 |
| Prediction_tool | Mean | Standard Deviation |
|---|---|---|
| dbCAN | 0.9391 | 0.2359 |
| HMMER | 0.9250 | 0.2554 |
| DIAMOND | 0.9530 | 0.2105 |
| Hotpep | 0.8758 | 0.3083 |
| CUPP | 0.9098 | 0.2712 |
| eCAMI | 0.9077 | 0.2778 |
Multilabel classification arises when a single instance can be assigned to multiple classes. In this evaluation a single instance is a protein and the classes are CAZy families; a single CAZyme can be assigned to multiple CAZy families. This is important to take into consideration because the approaches used for the statistical evaluation of binary classification provide only a limited view of the performance of the classifiers when applied to multilabel classification.
Plotted below are violin plots overlaid by scatter plots of the Rand Index and Adjusted Rand Index for every protein in every test set, excluding true negatives.
Figure 5.9: Violin plot of Rand Index (RI) of performance of the CAZyme classifiers to predict the multilabel classification of CAZy families.
Figure 5.10: Violin plot of Adjusted Rand Index (ARI) of performance of the CAZyme classifiers to predict the multilabel classification of CAZy families.
The performance of a classifier may vary between taxonomy groups. For this evaluation the test sets were separated into the taxonomy groups:
- Bacteria
- Eukaryote
The evaluation per classifier per taxonomy group was compared against the evaluation of all test sets pooled together.
Here we calculate the mean and standard deviation of the F1-score of each prediction tool for each taxonomy group, to represent the overall performance per taxonomy group.
| Prediction_tool | Bacterial Mean | Bacterial Standard Deviation | Eukaryote Mean | Eukaryote Standard Deviation | All Mean | All Standard Deviation |
|---|---|---|---|---|---|---|
| CUPP | 0.9217 | 0.0522 | 0.9103 | 0.0545 | 0.9167 | 0.0531 |
| dbCAN | 0.9434 | 0.0782 | 0.9385 | 0.0826 | 0.9412 | 0.0796 |
| DIAMOND | 0.9481 | 0.0919 | 0.9480 | 0.0908 | 0.9481 | 0.0907 |
| eCAMI | 0.9270 | 0.0763 | 0.8913 | 0.0960 | 0.9112 | 0.0868 |
| HMMER | 0.9210 | 0.0791 | 0.9425 | 0.0215 | 0.9305 | 0.0613 |
| Hotpep | 0.8898 | 0.0774 | 0.8817 | 0.1083 | 0.8862 | 0.0917 |
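The group summaries in the table above amount to a split-apply-combine over the per-test-set F1-scores; an illustrative stdlib Python sketch (the notebook performs this in R):

```python
from collections import defaultdict
from statistics import mean, stdev

def summarise_by_group(scores):
    """scores: list of (group, f1) pairs -> {group: (mean, sample sd)}."""
    by_group = defaultdict(list)
    for group, f1 in scores:
        by_group[group].append(f1)
    return {g: (mean(v), stdev(v)) for g, v in by_group.items()}
```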
Figure 6.1: One dimensional scatter plot overlaying a box and whisker plot of the specificity of binary classification per CAZyme/non-CAZyme classifier per taxonomy group. Each point represents the score from one test set.
Figure 6.2: One dimensional scatter plot overlaying a box and whisker plot of the sensitivity of binary classification per CAZyme/non-CAZyme classifier per taxonomy group. Each point represents the score from one test set.
Figure 6.3: One dimensional scatter plot overlaying a box and whisker plot of the precision of binary classification per CAZyme/non-CAZyme classifier per taxonomy group. Each point represents the score from one test set.
Figure 6.4: One dimensional scatter plot overlaying a box and whisker plot of the F1-score of binary classification per CAZyme/non-CAZyme classifier per taxonomy group. Each point represents the score from one test set.
Figure 6.5: One dimensional scatter plot overlaying a box and whisker plot of the accuracy of binary classification per CAZyme/non-CAZyme classifier per taxonomy group. Each point represents the score from one test set.
Below, a table containing the mean F1-score and standard deviation per CAZyme classifier per taxonomy group is presented, to represent the overall performance per CAZyme classifier per taxonomy group for CAZy class classification.
| Prediction_tool | Bacterial Mean | Bacterial Standard Deviation |
|---|---|---|
| CUPP | 0.9217 | 0.0522 |
| dbCAN | 0.9434 | 0.0782 |
| DIAMOND | 0.9481 | 0.0919 |
| eCAMI | 0.9270 | 0.0763 |
| HMMER | 0.9210 | 0.0791 |
| Hotpep | 0.8898 | 0.0774 |
To represent the overall CAZy class classification performance, and to take CAZy class multilabel classification into consideration, the Rand Index was calculated for each taxonomy group per CAZyme classifier.
| Prediction_tool | Bacteria Mean | Bacteria Standard Deviation | Eukaryote Mean | Eukaryote Standard Deviation | All Mean | All Standard Deviation |
|---|---|---|---|---|---|---|
| CUPP | 0.9615 | 0.1074 | 0.9637 | 0.1044 | 0.9625 | 0.1061 |
| dbCAN | 0.9802 | 0.0794 | 0.9781 | 0.0832 | 0.9793 | 0.0811 |
| DIAMOND | 0.9845 | 0.0711 | 0.9844 | 0.0710 | 0.9845 | 0.0711 |
| eCAMI | 0.9674 | 0.1008 | 0.9630 | 0.1064 | 0.9655 | 0.1034 |
| HMMER | 0.9725 | 0.0926 | 0.9750 | 0.0884 | 0.9736 | 0.0908 |
| Hotpep | 0.9495 | 0.1217 | 0.9533 | 0.1170 | 0.9512 | 0.1197 |
The Adjusted Rand Index was also calculated in order to account for chance.
| Prediction_tool | Bacteria Mean | Bacteria Standard Deviation | Eukaryote Mean | Eukaryote Standard Deviation | All Mean | All Standard Deviation |
|---|---|---|---|---|---|---|
| CUPP | 0.9004 | 0.2829 | 0.9011 | 0.2880 | 0.9007 | 0.2852 |
| dbCAN | 0.9427 | 0.2304 | 0.9361 | 0.2426 | 0.9398 | 0.2359 |
| DIAMOND | 0.9546 | 0.2078 | 0.9543 | 0.2080 | 0.9545 | 0.2079 |
| eCAMI | 0.9140 | 0.2691 | 0.8958 | 0.3006 | 0.9060 | 0.2836 |
| HMMER | 0.9225 | 0.2622 | 0.9322 | 0.2425 | 0.9268 | 0.2537 |
| Hotpep | 0.8681 | 0.3222 | 0.8739 | 0.3201 | 0.8706 | 0.3212 |
| Prediction_tool | Bacteria Mean | Bacteria Standard Deviation | Eukaryote Mean | Eukaryote Standard Deviation | All Mean | All Standard Deviation |
|---|---|---|---|---|---|---|
| CUPP | 0.9994 | 0.0016 | 0.9995 | 0.0015 | 0.9995 | 0.0015 |
| dbCAN | 0.9997 | 0.0011 | 0.9997 | 0.0012 | 0.9997 | 0.0011 |
| DIAMOND | 0.9998 | 0.0010 | 0.9998 | 0.0010 | 0.9998 | 0.0010 |
| eCAMI | 0.9994 | 0.0018 | 0.9995 | 0.0015 | 0.9994 | 0.0017 |
| HMMER | 0.9996 | 0.0014 | 0.9996 | 0.0014 | 0.9996 | 0.0014 |
| Hotpep | 0.9990 | 0.0025 | 0.9993 | 0.0019 | 0.9991 | 0.0023 |
| Prediction_tool | Bacteria Mean | Bacteria Standard Deviation | Eukaryote Mean | Eukaryote Standard Deviation | All Mean | All Standard Deviation |
|---|---|---|---|---|---|---|
| CUPP | 0.9118 | 0.2654 | 0.9073 | 0.2782 | 0.9098 | 0.2712 |
| dbCAN | 0.9420 | 0.2307 | 0.9354 | 0.2422 | 0.9391 | 0.2359 |
| DIAMOND | 0.9529 | 0.2104 | 0.9531 | 0.2105 | 0.9530 | 0.2105 |
| eCAMI | 0.9148 | 0.2621 | 0.8988 | 0.2961 | 0.9077 | 0.2778 |
| HMMER | 0.9201 | 0.2647 | 0.9311 | 0.2430 | 0.9250 | 0.2554 |
| Hotpep | 0.8715 | 0.3087 | 0.8812 | 0.3077 | 0.8758 | 0.3083 |
Overall, all CAZyme classifiers showed strong performances at all three levels of CAZyme classification (CAZyme/non-CAZyme, CAZy class and CAZy family).
Performance was extremely strong across all levels of CAZyme classification; the classifiers varied most greatly in sensitivity. In general, the CAZyme/non-CAZyme, CAZy class and CAZy family classifications were accurate for all CAZyme classifiers (i.e. when a classification was predicted it was frequently correct). However, the CAZyme classifiers do not predict a comprehensive CAZome: the differences in sensitivity indicate a non-comprehensive annotation of the CAZome, CAZy class members and CAZy family members.
Classifying bacterial or eukaryotic proteins had a negligible impact on the performance of CAZyme classification at every level of classification (CAZyme/non-CAZyme, CAZy class and CAZy family).